Associated manuscript: Assessing the calibration of transition probabilities in a multistate model out of the initial state
The first section of this document contains the plots assessing moderate calibration in the large development sample analysis for the pseudo-value and MLR-IPCW methods in the non-informative censoring (RC) scenario. To showcase each method's ability to assess non-linear patterns of miscalibration, there is a separate plot for each method, containing the calibration plots for the perfectly calibrated, over-predicting and under-predicting transition probabilities. These plots are of the same type as Figure 2 from the main manuscript, but for the pseudo-value and MLR-IPCW methods, as opposed to BLR-IPCW.
Figure S1: Assessment of moderate calibration for the BLR-IPCW approach in scenario RC, large sample analysis
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S2: Assessment of moderate calibration for the pseudo-value approach in scenario RC, large sample analysis
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S3: Assessment of moderate calibration for the MLR-IPCW approach in scenario RC, large sample analysis
The second section of this document contains the plots assessing moderate calibration in the large development sample analysis for the BLR-IPCW, pseudo-value and MLR-IPCW methods in the weakly and strongly associated censoring scenarios (WAC and SAC). There is a separate plot for each type of predicted transition probability, in which all three methods (BLR-IPCW, pseudo-value and MLR-IPCW) are compared. These plots are of the same type as Figures 3 and 4 from the main manuscript.
Figure S4: Assessment of moderate calibration for each method
Scenario = WAC, Perfectly calibrated transition probabilities
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S5: Assessment of moderate calibration for each method
Scenario = WAC, Miscalibrated 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S6: Assessment of moderate calibration for each method
Scenario = WAC, Miscalibrated 2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S7: Assessment of moderate calibration for each method
Scenario = SAC, Perfectly calibrated transition probabilities
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S8: Assessment of moderate calibration for each method
Scenario = SAC, Miscalibrated 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S9: Assessment of moderate calibration for each method
Scenario = SAC, Miscalibrated 2
Mean calibration according to AJ, BLR-IPCW and MLR-IPCW is presented for the perfectly calibrated and the miscalibrated predicted transition probabilities.
Figure S10: Large sample analysis, mean calibration
This section contains the mean calibration plots (median and 2.5 - 97.5 percentile range) for the small sample analysis when patients were grouped into a smaller number of groups (5 and 10) before estimating mean calibration using AJ. Results are also presented for sample size N = 1500. No results could be obtained for N = 1500 with 20 groups, because the groups were too small for the Aalen-Johansen estimator to be computed.
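To illustrate the grouping step described above, the following is a minimal Python sketch, not the code used for the analyses. It assumes that groups of near-equal size are formed by ranking individuals on their predicted transition probability. The function name `group_by_predicted_risk` and the predicted probabilities are hypothetical.

```python
def group_by_predicted_risk(predicted, n_groups):
    """Split individual indices into n_groups groups of near-equal size,
    ordered by predicted transition probability (lowest first)."""
    n = len(predicted)
    if n_groups < 1 or n_groups > n:
        raise ValueError("n_groups must be between 1 and the number of individuals")
    # Rank individuals by their predicted probability.
    order = sorted(range(n), key=lambda i: predicted[i])
    # Group g takes the ranked individuals in [g*n // n_groups, (g+1)*n // n_groups).
    return [order[g * n // n_groups:(g + 1) * n // n_groups] for g in range(n_groups)]


# Hypothetical predicted probabilities for N = 1500 individuals (all distinct).
predicted = [(i * 7919 % 1500) / 1500 for i in range(1500)]

for n_groups in (5, 10, 20):
    groups = group_by_predicted_risk(predicted, n_groups)
    print(n_groups, "groups:", len(groups[0]), "individuals per group")
```

With N = 1500, using 20 groups leaves 75 individuals per group, compared with 150 and 300 individuals per group for 10 and 5 groups respectively.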
Figure S11: Small sample analysis. Median and 2.5 - 97.5 percentile range in bias of mean calibration. N = 3000, groups = 10.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S12: Small sample analysis. Median and 2.5 - 97.5 percentile range in bias of mean calibration. N = 3000, groups = 5.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S13: Small sample analysis. Median and 2.5 - 97.5 percentile range in bias of mean calibration. N = 1500, groups = 10.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S14: Small sample analysis. Median and 2.5 - 97.5 percentile range in bias of mean calibration. N = 1500, groups = 5.
The third section of this document contains the plots assessing how robust BLR-IPCW and MLR-IPCW are to misspecification of the weights. We considered the following options:
- BLR-IPCW: weights were estimated from the data using a correctly specified model, as was done in the main manuscript.
- BLR: no inverse probability of censoring weights were applied in the calibration models.
- BLR-IPCW-DGM: weights were calculated directly from the data generating mechanism, rather than being estimated from the data.
The same options were applied to MLR.
We expect BLR-IPCW-DGM to be optimal. We think the most important comparison is with BLR, which applies no weighting.
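The options above differ only in where the probability of remaining uncensored comes from. As a minimal sketch of the DGM option, assume that censoring times are exponentially distributed with a known rate. This is an illustrative assumption, since the censoring mechanism of the simulation is not described in this document, and the function names and arguments below are hypothetical. An individual whose state at the evaluation time is known receives a weight equal to the inverse of the probability of being uncensored up to the point at which their state is determined.

```python
import math


def censoring_survival(time, censor_rate):
    """P(C > time) under an ASSUMED exponential censoring distribution."""
    return math.exp(-censor_rate * time)


def ipcw_weight(exit_time, censored, t_eval, censor_rate):
    """Inverse probability of censoring weight for one individual at t_eval.

    exit_time:   time of censoring or of entry into an absorbing state
                 (math.inf if neither occurs).
    censored:    True if exit_time is a censoring time.
    censor_rate: individual-specific censoring rate, taken as known (the DGM
                 option); it may depend on covariates when censoring is
                 associated with them.
    """
    if censored and exit_time < t_eval:
        # State at t_eval is unknown, so the individual is excluded.
        return None
    # The state at t_eval is determined at the earlier of t_eval and
    # entry into an absorbing state.
    observed_until = min(exit_time, t_eval)
    return 1.0 / censoring_survival(observed_until, censor_rate)


def unweighted(exit_time, censored, t_eval):
    """The option with no weighting: every included individual has weight 1."""
    if censored and exit_time < t_eval:
        return None
    return 1.0
```

Replacing `censoring_survival` with an estimate from a model fitted to the data would correspond to the option in which weights are estimated rather than taken from the data generating mechanism.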
OLD DISCUSSION, TO REMOVE: The important conclusions from these figures are that in scenarios WAC and SAC, even when the weights are misspecified (BLR-IPCW-MISS and MLR-IPCW-MISS), there is not a huge drop in performance. If one fails to adjust for weights at all ('BLR' or 'MLR' approaches), there is a considerable drop in performance. This is even true for assessing mean calibration, and is most evident in Figures S25 and S26.
Figure S15: Misspecification of weights, BLR
Scenario = RC, Perfectly calibrated transition probabilities
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S16: Misspecification of weights, BLR
Scenario = RC, Miscalibrated 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S17: Misspecification of weights, BLR
Scenario = RC, Miscalibrated 2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S18: Misspecification of weights, BLR
Scenario = WAC, Perfectly calibrated transition probabilities
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S19: Misspecification of weights, BLR
Scenario = WAC, Miscalibrated 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S20: Misspecification of weights, BLR
Scenario = WAC, Miscalibrated 2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S21: Misspecification of weights, BLR
Scenario = SAC, Perfectly calibrated transition probabilities
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S22: Misspecification of weights, BLR
Scenario = SAC, Miscalibrated 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S23: Misspecification of weights, BLR
Scenario = SAC, Miscalibrated 2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S24: Misspecification of weights, MLR
Scenario = RC, Perfectly calibrated transition probabilities
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S25: Misspecification of weights, MLR
Scenario = RC, Miscalibrated 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S26: Misspecification of weights, MLR
Scenario = RC, Miscalibrated 2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S27: Misspecification of weights, MLR
Scenario = WAC, Perfectly calibrated transition probabilities
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S28: Misspecification of weights, MLR
Scenario = WAC, Miscalibrated 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S29: Misspecification of weights, MLR
Scenario = WAC, Miscalibrated 2
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S30: Misspecification of weights, MLR
Scenario = SAC, Perfectly calibrated transition probabilities
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S31: Misspecification of weights, MLR
Scenario = SAC, Miscalibrated 1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S32: Misspecification of weights, MLR
Scenario = SAC, Miscalibrated 2
We then present sensitivity analyses for the large sample analysis assessment of mean calibration. The plot is the same as that in section 3, except AJ is implemented without grouping individuals by predicted risk, and BLR-IPCW and MLR-IPCW are implemented without inverse probability of censoring weights.
Figure S33: Large sample analysis, mean calibration, sensitivity analysis. AJ implemented without grouping individuals by predicted transition probabilities of state of interest. BLR-IPCW and MLR-IPCW implemented without inverse probability of censoring weights.
We then present sensitivity analyses for the small sample analysis assessment of mean calibration. The plots are the same as section 4, except AJ is implemented without grouping individuals by predicted risk, and BLR-IPCW and MLR-IPCW are implemented without inverse probability of censoring weights.
Figure S34: Small sample analysis, sensitivity analysis. Median and 2.5 - 97.5 percentile range in bias of mean calibration. N = 3000. AJ implemented without grouping individuals by predicted transition probabilities of state of interest. BLR-IPCW and MLR-IPCW implemented without inverse probability of censoring weights.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure S35: Small sample analysis, sensitivity analysis. Median and 2.5 - 97.5 percentile range in bias of mean calibration. N = 1500. AJ implemented without grouping individuals by predicted transition probabilities of state of interest. BLR-IPCW and MLR-IPCW implemented without inverse probability of censoring weights.
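The small sample figures summarise the bias in mean calibration across simulated datasets by its median and 2.5 - 97.5 percentile range. A minimal Python sketch of that summary is given below. The function name `summarise_bias` and the bias values are hypothetical, and percentiles are computed by linear interpolation between order statistics, which is an assumption about the method rather than a description of the analysis code.

```python
import statistics


def summarise_bias(biases):
    """Median and 2.5 - 97.5 percentile range of bias across simulated datasets."""
    # n=40 gives cut points at every 2.5%: the first is the 2.5th percentile
    # and the last is the 97.5th percentile.
    cuts = statistics.quantiles(biases, n=40, method="inclusive")
    return {
        "median": statistics.median(biases),
        "lower": cuts[0],
        "upper": cuts[-1],
    }


# Hypothetical biases, one per simulated dataset.
biases = [(i - 20) / 100 for i in range(41)]
summary = summarise_bias(biases)
print(summary)
```

A narrow percentile range centred on zero indicates a method that is both unbiased and stable across simulated datasets.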
This section contains the moderate calibration plot for the clinical example, when using a development dataset of size N = 100,000. The models (N = 5,000 and N = 100,000) were both validated in the same validation dataset of size N = 100,000. The closer grouping of points in the MLR-IPCW calibration scatter plot is evident for the model with development sample size N = 100,000, indicating a better calibrated model.
Figure S36: Moderate calibration according to each method (development sample size N = 100,000)